A Measure of Term Representativeness Based on the Number of Co-occurring Salient Words
Authors
Abstract
We propose a novel measure of the representativeness (i.e., indicativeness or topic specificity) of a term in a given corpus. The measure embodies the idea that the distribution of words co-occurring with a representative term should be biased relative to the word distribution in the whole corpus. This bias is quantified as the number of distinct words whose occurrences are saliently biased among the co-occurring words. A word's saliency is determined by a threshold probability that can be set automatically from the whole corpus. A comparative evaluation showed that the measure is clearly superior to conventional measures at finding topic-specific words in newspaper archives of different sizes.
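The counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how saliency is tested or how the threshold probability is derived, so the sketch assumes a binomial upper-tail test against the corpus-wide word rate and takes the threshold as a plain parameter. The function names, the toy corpus, and the choice to pool all words from documents containing the term are likewise hypothetical.

```python
from collections import Counter
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly (fine for small n)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def salient_cooccurring_words(docs, term, threshold=0.1):
    """Return the distinct words that are saliently over-represented near `term`.

    Assumption: a word is salient when the chance of seeing it at least this
    often among the co-occurring words, given its corpus-wide rate, falls
    below `threshold`. The source only says the threshold is set from the
    whole corpus; here it is simply passed in.
    """
    corpus_counts = Counter(w for doc in docs for w in doc)
    corpus_size = sum(corpus_counts.values())

    # Co-occurring words: every token in documents containing the term,
    # excluding the term itself.
    pooled = [w for doc in docs if term in doc for w in doc if w != term]
    n = len(pooled)

    salient = set()
    for word, k in Counter(pooled).items():
        p = corpus_counts[word] / corpus_size  # corpus-wide rate of the word
        if binomial_upper_tail(k, n, p) < threshold:
            salient.add(word)
    return salient


def representativeness(docs, term, threshold=0.1):
    """Score a term by the number of distinct salient co-occurring words."""
    return len(salient_cooccurring_words(docs, term, threshold))


# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["bank", "loan", "rate", "of"],
    ["bank", "loan", "credit", "of"],
    ["team", "goal", "match", "of"],
    ["team", "goal", "coach", "of"],
    ["rain", "wind", "cloud", "of"],
    ["snow", "wind", "storm", "of"],
]

print(representativeness(docs, "bank"))  # topic-specific term
print(representativeness(docs, "of"))    # function word present everywhere
```

In this toy corpus the topical term "bank" scores higher than "of", because the words around "of" mirror the corpus as a whole and none of them passes the saliency test.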
Publication date: 2002